In this notebook, we observe the annotation results obtained from SingleR; Those results were obtained in R in the R markdown named: SingleR_K.Rmd/SingleR_K.html
In this exercice we first predict labels for the PBMC3k datasets using:
Besca annotated datasets:
(a) Kotliarov (b) Granja (c) Kotliarov + Granja
Finally we also show briefly prediction on Granja with Kotliarov and Kotliarov with Granja.
PMBC3k predition by singleR will be compared with the auto-annot prediction.
import besca as bc
import scanpy as sc
import pandas as pd
We load the dataset predicted with Auto-Annot
pbmc3K_prediction = sc.read_h5ad("adata_predicted_16072020.h5ad")
In this pbm3k data are stored multiples scores and information. Leiden is the leiden clustering on the whole datasets post classical filtering. dblabel is the results of the sig-annot procedure on the whole dataset also after reclustering around T-cells/NK-cells for a finer grain annotation. Finally auto_annot is the results of the auto-annot procedure using Granja and Kotliarov datasets as training sets.
sc.pl.umap(pbmc3K_prediction, color = ["leiden", "dblabel", "auto_annot"], ncols = 1)
bc.tl.auto_annot.report(pbmc3K_prediction, celltype = 'dblabel', method = 'autoannot', analysis_name= 'autoannot',
train_datasets='pmbc3k', test_dataset='GranjaAndKV', merge = True, use_raw=False)
besca.tl.auto_annot.report(...) is deprecated( besca > 2.3); please use besca.tl.report(...) WARNING: saving figure to file figures/umap.ondata_autoannot.png
WARNING: saving figure to file figures/umap.autoannot.png
Confusion matrix, without normalization Normalized confusion matrix
We load the prediction obtained with SingleR and upload the said results in our h5ad object.
SingleR predictions were obtained in R, please see SingleR_K.html / SingleR_K.Rmd
read_Monaco_prediction= pd.read_csv("pred.pmbc3kwithMonaco.csv", index_col= 0)
read_Monaco_prediction.rename(columns={'labels': 'Monaco_label'},
inplace=True)
read_KV_prediction = pd.read_csv("predictedKV.csv", index_col= 0)
read_KV_prediction.rename(columns={'labels': 'KV_dblabel'},
inplace=True)
read_Granja_prediction = pd.read_csv("predictedGranja.csv", index_col= 0)
read_Granja_prediction.rename(columns={'labels': 'Granja_dblabel'},
inplace=True)
read_KV_Granja_prediction = pd.read_csv("combined_prediction.csv", index_col= 0)
read_KV_Granja_prediction.rename(columns={'labels': 'KVandGranja_dblabel'},
inplace=True)
We need to map back the labels to dblabels using the dblabel nomenclature (described in besca in CellTypes_v1.tsv) when using the singlecell datasets provided by SingleR (in the celldex package)
We do an exact mapping (if a subtype is not find, we go to large one). But we also do an exact mapping, ie. keeping as if celltype not supported now by besca
dblabel_Non_exactmapping = {'Central memory CD8 T cells': 'central memory CD8-positive, alpha-beta T cell',
'Classical monocytes': 'classical monocytes',
'Effector memory CD8 T cells': 'effector memory CD8-positive, alpha-beta T cell',
'Exhausted B cells': 'exhausted B cell',
'Follicular helper T cells': 'CD4-positive helper T cell',
'Intermediate monocytes' : 'intermediate monocyte', # THIS IS NOT YET A DBLABEL BUT EFO IS CL:0002393
'Low-density basophils' : 'mature basophil', # THIS IS NOT YET A DBLABEL BUT EFO annotated was :0000043
'Low-density neutrophils' : 'mature neutrophil', # true mapping is 'mature neutrophil' , # THIS IS NOT YET A DBLABEL BUT EFO annotated was CL:0000096
'MAIT cells' : 'MAIT',
'Myeloid dendritic cells' : 'myeloid dendritic cell',
'Naive B cells' : 'naive B cell' ,
'Naive CD4 T cells' : 'naive thymus-derived CD4-positive, alpha-beta T cell',
'Naive CD8 T cells': 'naive thymus-derived CD8-positive, alpha-beta T cell',
'Natural killer cells' : 'cytotoxic CD56-dim natural killer cell',
'Non classical monocytes': 'non-classical monocyte',
'Non-switched memory B cells' : 'Non-switched memory B cells', # Non switched is a subgroup of memory Bcells CL:00000970)
'Plasmablasts' : 'Plasmablasts', # We do not have this subgroup either,http://purl.obolibrary.org/obo/CL_0000980
'Plasmacytoid dendritic cells':'plasmacytoid dendritic cell',
'Progenitor cells': 'Progenitor cells' ,
'T regulatory cells' : 'regulatory T cell' ,
'Terminal effector CD4 T cells' : 'effector CD4-positive, alpha-beta T cell',
'Th1 cells': 'T-helper 1 cell'
}
dblabel_mapping = {'Central memory CD8 T cells': 'central memory CD8-positive, alpha-beta T cell',
'Classical monocytes': 'classical monocytes',
'Effector memory CD8 T cells': 'effector memory CD8-positive, alpha-beta T cell',
'Exhausted B cells': 'exhausted B cell',
'Follicular helper T cells': 'CD4-positive helper T cell',
'Intermediate monocytes' : 'intermediate monocyte', # THIS IS NOT YET A DBLABEL BUT EFO IS CL:0002393
'Low-density basophils' : 'basophil', # 'mature basophil', # THIS IS NOT YET A DBLABEL BUT EFO annotated was :0000043
'Low-density neutrophils' : 'neutrophil', # true mapping is 'mature neutrophil' , # THIS IS NOT YET A DBLABEL BUT EFO annotated was CL:0000096
'MAIT cells' : 'MAIT',
'Myeloid dendritic cells' : 'myeloid dendritic cell',
'Naive B cells' : 'naive B cell' ,
'Naive CD4 T cells' : 'naive thymus-derived CD4-positive, alpha-beta T cell',
'Naive CD8 T cells': 'naive thymus-derived CD8-positive, alpha-beta T cell',
'Natural killer cells' : 'cytotoxic CD56-dim natural killer cell',
'Non classical monocytes': 'non-classical monocyte',
'Non-switched memory B cells' : 'memory B cell', # Non switched is a subgroup of memory Bcells CL:00000970)
'Plasmablasts' : 'B cell', # We do not have this subgroup either,http://purl.obolibrary.org/obo/CL_0000980
'Plasmacytoid dendritic cells':'plasmacytoid dendritic cell',
'Progenitor cells': 'hematopoietic multipotent progenitor cell' ,
'T regulatory cells' : 'regulatory T cell' ,
'Terminal effector CD4 T cells' : 'effector CD4-positive, alpha-beta T cell',
'Th1 cells': 'T-helper 1 cell'}
read_Monaco_prediction['dblabel_monaco'] = read_Monaco_prediction.Monaco_label.copy().map(dblabel_mapping)
read_Monaco_prediction['label_monaco'] = read_Monaco_prediction.Monaco_label.copy().map(dblabel_Non_exactmapping)
For datasets that were annotated with Besca (ie. Granja and Kotliarov), labels are consistents already, and comparisons is straight forward.
obs_bis = pbmc3K_prediction.obs.join(read_KV_prediction["KV_dblabel"], how = "left")
obs_bis = obs_bis.join(read_Granja_prediction["Granja_dblabel"], how = "left")
obs_bis = obs_bis.join(read_KV_Granja_prediction["KVandGranja_dblabel"], how = "left")
obs_bis = obs_bis.join(read_Monaco_prediction["Monaco_label"], how = "left") # REMEMBER THIS IOS BEFORE MAPPING DBLABEL
obs_bis = obs_bis.join(read_Monaco_prediction["label_monaco"], how = "left")
obs_bis = obs_bis.join(read_Monaco_prediction["dblabel_monaco"], how = "left")
pbmc3K_prediction.obs = obs_bis.copy()
sc.pl.umap( pbmc3K_prediction, color = ["leiden", 'Monaco_label'], ncols= 2)
... storing 'KV_dblabel' as categorical ... storing 'Granja_dblabel' as categorical ... storing 'KVandGranja_dblabel' as categorical ... storing 'Monaco_label' as categorical ... storing 'label_monaco' as categorical ... storing 'dblabel_monaco' as categorical
#sc.pl.umap( pbmc3K_prediction, color = [ 'dblabel', "auto_annot", "KV_dblabel","Granja_dblabel","KVandGranja_dblabel" ], ncols= 2)
We fix the palette to keep the same colors for the UMAPs celltypes
values_ = set( pd.Categorical(pbmc3K_prediction.obs.dblabel).categories.tolist()+\
pd.Categorical(pbmc3K_prediction.obs.KV_dblabel).categories.tolist()+\
pd.Categorical(pbmc3K_prediction.obs.Granja_dblabel).categories.tolist()+\
pd.Categorical(pbmc3K_prediction.obs.KVandGranja_dblabel).categories.tolist()+\
pd.Categorical(pbmc3K_prediction.obs.dblabel_monaco).categories.tolist()+\
pd.Categorical(pbmc3K_prediction.obs.label_monaco).categories.tolist()+\
pd.Categorical(pbmc3K_prediction.obs.auto_annot).categories.tolist())
import seaborn as sns
palette_ = sns.color_palette("tab10", len( values_) )
palette_colors = dict(zip(values_, palette_))
#palette_colors
bc.pl.update_qualitative_palette( pbmc3K_prediction, palette_colors, group = 'dblabel', checkColors = True)
bc.pl.update_qualitative_palette( pbmc3K_prediction, palette_colors, group = 'auto_annot', checkColors = True)
bc.pl.update_qualitative_palette( pbmc3K_prediction, palette_colors, group = 'KV_dblabel', checkColors = True)
bc.pl.update_qualitative_palette( pbmc3K_prediction, palette_colors, group = 'Granja_dblabel', checkColors = True)
bc.pl.update_qualitative_palette( pbmc3K_prediction, palette_colors, group = 'KVandGranja_dblabel', checkColors = True)
#bc.pl.update_qualitative_palette( pbmc3K_prediction, palette_colors, group = 'label_monaco', checkColors = True)
bc.pl.update_qualitative_palette( pbmc3K_prediction, palette_colors, group = 'dblabel_monaco', checkColors = True)
sc.pl.umap( pbmc3K_prediction, color = ["leiden", 'dblabel'], ncols= 2)
sc.pl.umap( pbmc3K_prediction, color = ["leiden", 'auto_annot'], ncols= 2)
sc.pl.umap( pbmc3K_prediction, color = ["leiden", 'KV_dblabel'], ncols= 2)
sc.pl.umap( pbmc3K_prediction, color = ["leiden", 'Granja_dblabel'], ncols= 2)
sc.pl.umap( pbmc3K_prediction, color = ["leiden", 'KVandGranja_dblabel'], ncols= 2)
sc.pl.umap( pbmc3K_prediction, color = ["leiden", 'label_monaco'], ncols= 2)
sc.pl.umap( pbmc3K_prediction, color = ["leiden", 'dblabel_monaco'], ncols= 2)
#annot_comparison in adata_pred.obs_keys()
adata_pred = pbmc3K_prediction
method = "singleR"
merge = False
remove_nonshared = True
use_raw = True
genes_to_use = ""
clustering = "leiden"
asymmetric_matrix= True
fig = bc.tl.report(adata_pred,
celltype = "dblabel", method = "SingleR",
analysis_name = "PMBCwithMonaco",
train_datasets= "Monaco", test_dataset= "pbmc3k", name_report= "SingleR",
merge = merge, name_prediction = 'dblabel_monaco',
use_raw = use_raw, genes_to_use = genes_to_use,
remove_nonshared = False, clustering =clustering, asymmetric_matrix = True)
#fig.show()
WARNING: saving figure to file figures/umap.ondata_PMBCwithMonaco.png WARNING: saving figure to file figures/umap.PMBCwithMonaco.png Confusion matrix, without normalization Normalized confusion matrix
fig = bc.tl.report(adata_pred,
celltype = "dblabel", method = "auto_annot",
analysis_name = "GranjaKV_",
train_datasets= "GranjaKotliarov", test_dataset= "pbmc3k", name_report= "auto_annot_",
merge = merge, name_prediction = 'auto_annot',
use_raw = use_raw, genes_to_use = genes_to_use,
remove_nonshared = False, clustering =clustering, asymmetric_matrix = True)
#fig.show()
WARNING: saving figure to file figures/umap.ondata_GranjaKV_.png WARNING: saving figure to file figures/umap.GranjaKV_.png Confusion matrix, without normalization Normalized confusion matrix
fig = bc.tl.report(adata_pred, celltype = "dblabel", method = "SingleR",
analysis_name = "GranjaPred_dblabel", name_report='SingleR_',
train_datasets= "Granja", test_dataset= "pbmc3k",
merge = merge, name_prediction = 'Granja_dblabel',
use_raw = use_raw, genes_to_use = genes_to_use,
remove_nonshared = False, clustering =clustering, asymmetric_matrix = True)
#fig.show()
WARNING: saving figure to file figures/umap.ondata_GranjaPred_dblabel.png WARNING: saving figure to file figures/umap.GranjaPred_dblabel.png Confusion matrix, without normalization Normalized confusion matrix
fig = bc.tl.report(adata_pred, celltype = "dblabel", method = "SingleR",
analysis_name = "KotliarovPred_dblabel",name_report='SingleR',
train_datasets= "Granja", test_dataset= "pbmc3k",
merge = merge, name_prediction = 'KV_dblabel',
use_raw = use_raw, genes_to_use = genes_to_use,
remove_nonshared = False, clustering =clustering, asymmetric_matrix = True)
#fig.show()
WARNING: saving figure to file figures/umap.ondata_KotliarovPred_dblabel.png WARNING: saving figure to file figures/umap.KotliarovPred_dblabel.png Confusion matrix, without normalization
fig = bc.tl.report(adata_pred, celltype = "dblabel", method = "SingleR",
analysis_name = "KVandGranja_dblabel",
train_datasets= "Granja", test_dataset= "pbmc3k",
merge = merge, name_prediction = 'KVandGranja_dblabel', name_report='SingleR_',
use_raw = use_raw, genes_to_use = genes_to_use,
remove_nonshared = False, clustering =clustering, asymmetric_matrix = True)
fig.show()
fig = bc.tl.report(adata_pred, celltype = "auto_annot", method = "SingleR", analysis_name = "KVandGranja_autoAnnot",
train_datasets= "Granja", test_dataset= "pbmc3k", name_report='SingleR_vsAutoAnnot',
merge = merge, name_prediction = 'KVandGranja_dblabel',
use_raw = use_raw, genes_to_use = genes_to_use,
remove_nonshared = False, clustering =clustering, asymmetric_matrix = True)
#fig.show()
Besca report generate overall models. We retrieved the overall F1 and accuracy score for all reports generated in order to compare those values.
from os import listdir
from os.path import isfile, join
onlyfiles = [f for f in listdir(".") if isfile(join(".", f)) and '_report_' in f]
f1_values = pd.DataFrame(columns=['accuracy', 'f1' ] , index=onlyfiles)
for report in onlyfiles :
get_info = open(report, "r").readlines()[3]
dd = get_info.split(',')
f1_values.loc[report] =[dd[1], dd[3].rstrip('\n')]
f1_values
We can see that prediction with Monaco datasets is problematic; which was visible in the UMAP and the generated reports (overall most cells were predicted as monocytes). One note here is due to installation issue, normalization of the data was done using Seurat not by Scuttle which was not available.
Overall, Single R compare with Auto-annot prediction are close (F1 = 0.613); auto-annot report is slightly closer to sig-annot prediction on pmbc3k (training with Granja+Kotliarov). (AutoAnnot F1 = 0.67; SingleR F1 = 0.65).
Granja predicted with Kotliarov Kotliarov predicted with Granja
Granja = bc.datasets.Granja2019_processed()
Kotliarov = bc.datasets.Kotliarov2020_processed()
read_Kotliarovprediction_TrainGranja= pd.read_csv("pred.KV_trainGranja.csv", index_col= 0)
read_Kotliarovprediction_TrainGranja.rename(columns={'labels': 'Granja_pred'},
inplace=True)
obs_bis = Kotliarov.obs.join(read_Kotliarovprediction_TrainGranja["Granja_pred"], how = "left")
Kotliarov.obs = obs_bis.copy()
read_GranjaPred_TrainKotliarov = pd.read_csv("pred.Granja_trainKV.csv", index_col= 0)
read_GranjaPred_TrainKotliarov.rename(columns={'labels': 'KV_pred'},
inplace=True)
obs_bis = Granja.obs.join(read_GranjaPred_TrainKotliarov["KV_pred"], how = "left")
Granja.obs = obs_bis.copy()
bc.tl.report(Kotliarov,
celltype = "celltype3", method = "SingleR",
analysis_name = "KotliarovTest_TrainedGranja",
train_datasets= "Kotliarov", test_dataset= "Granja", name_report= "SingleR",
merge = merge, name_prediction = 'Granja_pred',
use_raw = True, genes_to_use = 'all',
remove_nonshared = False, clustering = 'leiden', asymmetric_matrix = True)
bc.tl.report(Granja,
celltype = "celltype3", method = "SingleR",
analysis_name = "GranjaTest_TrainedKotliarov",
train_datasets= "Granja", test_dataset= "Kotliarov", name_report= "SingleR",
merge = merge, name_prediction = 'KV_pred',
use_raw = True, genes_to_use = 'all',
remove_nonshared = False, clustering = 'leiden', asymmetric_matrix = True)
! jupyter nbconvert --to html Single_R_comparison.ipynb
from sinfo import sinfo
sinfo()